Hybrid deployments #563
base: main
Conversation
Working on testing this. I think it "works", but I wanted some feedback on what I'm missing or not implementing correctly.
```yaml
- name: Libvirt - Pause for power down
  pause:
    seconds: 1
  when: not redfish_forceoff.failed
```
Might be able to use a "check for powered down" type of task here instead of a sleep.
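A hedged sketch of what such a check could look like, reusing the Redfish URL pattern from the neighbouring tasks (the port, host variables, Systems path, and the group being looped over are assumptions carried over from this diff, not the role's actual code):

```yaml
# Sketch only: instead of a fixed sleep, poll the emulated Redfish endpoint
# until each VM reports PowerState "Off". URL layout, port, host variables,
# and the loop group are assumptions taken from neighbouring tasks.
- name: Libvirt - Wait for power down
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Systems/{{ hostvars[item]['domain_uuid'] }}"
    method: GET
    return_content: true
  register: redfish_power
  until: redfish_power.json.PowerState == "Off"
  retries: 30
  delay: 2
  loop: "{{ groups[inventory_group] }}"
  loop_control:
    label: "{{ item }}"
```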
I was able to remove this entirely on my tested deployments (3, 27, and 54 VMs)
Force-pushed from 1731073 to 2d70df6
re: issue #536
Force-pushed from faee549 to 9e0ea2c
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: radez. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Force-pushed from c656c10 to 22cec71
```yaml
- role: boot-iso
  vars:
    inventory_group: hv_vm
    index: "{{ hybrid_worker_count }}"
```
In the context of someone running an ACM scale test where all of the hv_vm entries are actually, say, SNOs, does this task dump thousands of lines of skipped tasks, or does it just skip the role? If it dumps thousands of lines, I think we should revisit how this is performed, perhaps using a different inventory group.
You're right, it does dump thousands of lines. I had looked at adding a loop_var to this at one point to make the output more meaningful.
I'm wondering if, instead of including another boot-iso role here over the hv_vm workers, we could just copy the desired number of hv_vm entries we want to use under workers instead. I will think of a more automated way to accomplish this as well. WDYT?
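One lightweight alternative, without copying hosts between groups, could be to loop over only the first hybrid_worker_count entries of hv_vm so the remaining hosts never appear as skipped. A hedged sketch; the included task file and loop variable names are hypothetical:

```yaml
# Sketch only: iterate over just the first hybrid_worker_count entries of the
# hv_vm group instead of the whole group. boot_hybrid_worker.yml and vm_name
# are hypothetical names, not files or variables from this repo.
- name: Boot ISO on the selected hybrid workers only
  include_tasks: boot_hybrid_worker.yml
  loop: "{{ groups['hv_vm'][:hybrid_worker_count | int] }}"
  loop_control:
    loop_var: vm_name
    label: "{{ vm_name }}"
```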
- Specify Podman as the deploy type for the bastion AI container (example podman configmap: https://github.com/openshift/assisted-service/blob/master/deploy/podman/configmap.yml)
- No need to patch the cluster network settings after boot; all the same settings are defined at cluster creation.
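For illustration, a hedged sketch of what defining those settings at cluster creation could look like against the Assisted Installer v2 API. The cluster_name, openshift_version, and pull_secret variables are placeholders, and the exact set of create-time fields accepted depends on the assisted-service version in use:

```yaml
# Sketch only: register the cluster with the network settings included so no
# follow-up PATCH is required. cluster_name, openshift_version, and
# pull_secret are placeholder variables; which create-time fields are
# accepted depends on the assisted-service version.
- name: Create cluster with network settings defined up front
  uri:
    url: "http://{{ assisted_installer_host }}:{{ assisted_installer_port }}/api/assisted-install/v2/clusters"
    method: POST
    status_code: [201]
    return_content: true
    body_format: json
    body: {
      "name": "{{ cluster_name }}",
      "openshift_version": "{{ openshift_version }}",
      "pull_secret": "{{ pull_secret }}",
      "cluster_networks": [{"cidr": "{{ cluster_network_cidr }}", "host_prefix": "{{ cluster_network_host_prefix }}"}],
      "service_networks": [{"cidr": "{{ service_network_cidr }}"}],
      "api_vips": [{"ip": "{{ controlplane_network_api }}"}],
      "ingress_vips": [{"ip": "{{ controlplane_network_ingress }}"}],
      "network_type": "{{ networktype }}"
    }
```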
I was able to start building some clusters where my control plane nodes were bare metal and my worker nodes were VMs; however, I have some feedback.
```yaml
- name: Libvirt - Check for Virtual Media
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Managers/{{ hostvars[item]['domain_uuid'] }}/VirtualMedia/Cd"
```
I had to change Managers to Systems because Managers didn't work 🤣
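For reference, the working URL shape per that change; only "Managers" is swapped for "Systems", and the method/return_content lines are assumed since the diff only shows the url line:

```yaml
# Working variant per the comment above: only "Managers" is swapped for
# "Systems". Method and return_content are assumed; the diff only shows url.
- name: Libvirt - Check for Virtual Media
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Systems/{{ hostvars[item]['domain_uuid'] }}/VirtualMedia/Cd"
    method: GET
    return_content: true
```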
```yaml
- name: Libvirt - Eject any CD Virtual Media
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Managers/{{ hostvars[item]['domain_uuid'] }}/VirtualMedia/Cd/Actions/VirtualMedia.EjectMedia"
```
Same here with Managers
```yaml
- name: Libvirt - Insert virtual media
  uri:
    url: "http://{{ hostvars[item]['ansible_host'] }}:9000/redfish/v1/Managers/{{ hostvars[item]['domain_uuid'] }}/VirtualMedia/Cd/Actions/VirtualMedia.InsertMedia"
```
Same here with Managers
"additional_ntp_source": "{{ bastion_controlplane_ip if use_bastion_registry else labs[lab]['ntp_server'] }}", | ||
"api_vips": [{"ip": "{{ controlplane_network_api }}"}], | ||
"ingress_vips": [{"ip": "{{ controlplane_network_ingress }}"}], | ||
"network_type": "{{ networktype }}" |
I believe we need to revert this.
I found the root of the original issue that was causing my deployment with this PR to fail. We can leave this as is.
```yaml
- name: Patch cluster network settings
  uri:
    url: "http://{{ assisted_installer_host }}:{{ assisted_installer_port }}/api/assisted-install/v2/clusters/{{ ai_cluster_id }}"
    method: PATCH
    status_code: [201]
    return_content: true
    body_format: json
    body: {
      "cluster_networks": [
        {
          "cidr": "{{ cluster_network_cidr }}",
          "cluster_id": "{{ ai_cluster_id }}",
          "host_prefix": "{{ cluster_network_host_prefix }}"
        }
      ],
      "service_networks": [
        {
          "cidr": "{{ service_network_cidr }}",
          "cluster_id": "{{ ai_cluster_id }}"
        }
      ]
    }

- name: Patch cluster ingress/api vip addresses
  uri:
    url: "http://{{ assisted_installer_host }}:{{ assisted_installer_port }}/api/assisted-install/v2/clusters/{{ ai_cluster_id }}"
    method: PATCH
    status_code: [201]
    return_content: true
    body_format: json
    body: {
      "cluster_network_host_prefix": "{{ cluster_network_host_prefix }}",
      "vip_dhcp_allocation": "{{ vip_dhcp_allocation }}",
      "ingress_vips": [{"ip": "{{ controlplane_network_ingress }}"}],
      "api_vips": [{"ip": "{{ controlplane_network_api }}"}],
      "network_type": "{{ networktype }}"
    }
```
I believe this needs to be reverted as I was unable to make a cluster without including this.
OK, from my findings we need to just retain lines 58 through 79 so we set cluster_networks and service_networks. I will investigate whether we can set these at cluster creation as a follow-up.
"additional_ntp_source": "{{ bastion_controlplane_ip if use_bastion_registry else labs[lab]['ntp_server'] }}", | ||
"api_vips": [{"ip": "{{ controlplane_network_api }}"}], | ||
"ingress_vips": [{"ip": "{{ controlplane_network_ingress }}"}], | ||
"network_type": "{{ networktype }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the root of the original issue that was causing my deployment with this to fail PR. We can leave this as is.
Baremetal control plane and virtual workers. I think this will also handle a baremetal control plane with both baremetal and virtual workers.
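For context, a purely illustrative inventory shape for such a hybrid deployment. Only hv_vm and hybrid_worker_count come from this PR; the other group and host names are assumptions:

```yaml
# Illustrative only: bare metal control plane plus hv_vm VMs booted as workers.
# Only hv_vm and hybrid_worker_count appear in this PR; everything else here
# is an assumed example.
all:
  vars:
    hybrid_worker_count: 2        # how many hv_vm entries to boot as workers
  children:
    controlplane:
      hosts:
        bm-cp-0:
        bm-cp-1:
        bm-cp-2:
    hv_vm:
      hosts:
        vm00001:
        vm00002:
        vm00003:
```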